Smart Paper Notes

Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Category: Robotics

2022-02-26

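We propose a new method for collision-free path planning that uses conditional generative adversarial networks (cGANs) to map between the robot's joint space and a latent space that captures only the collision-free regions of the joint space, conditioned on an obstacle map. When manipulating a robot arm, it is convenient to generate multiple plausible trajectories for further selection. In addition, for safety reasons, the generated trajectories must avoid collisions with the robot itself and with the surrounding environment. In the proposed method, a variety of trajectories can be generated by connecting the start and goal states with arbitrary line segments in the generated latent space. Our method provides this collision-free latent space, after which any planner, using any optimization criterion, can be used to generate the most suitable path. We successfully verified the method with a simulated and a real UR5e 6-DoF robot arm, and confirmed that different trajectories are generated depending on the choice of optimization criterion.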
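The latent-space planning idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_decoder` is a hypothetical stand-in for the trained cGAN generator (a real one would also take the obstacle map as a condition and would be what guarantees collision-free output), and `plan`, its via points, and the path-length cost are assumptions chosen only to show how an arbitrary optimization criterion can pick among latent-space line segments.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate_latent(z_start: Sequence[float], z_goal: Sequence[float],
                       n_points: int) -> List[Vector]:
    """Return n_points evenly spaced points on the segment z_start -> z_goal."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    path = []
    for i in range(n_points):
        t = i / (n_points - 1)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_goal)])
    return path


def toy_decoder(z: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the generator: latent point -> joint angles.

    It only squashes each latent coordinate into (-pi, pi); it does NOT
    model obstacles the way a trained, map-conditioned generator would.
    """
    return [math.pi * math.tanh(c) for c in z]


def decode_path(latent_path: List[Vector],
                decoder: Callable[[Sequence[float]], Vector]) -> List[Vector]:
    """Map every latent waypoint to joint space."""
    return [decoder(z) for z in latent_path]


def joint_path_length(joint_path: List[Vector]) -> float:
    """One possible optimization criterion: total joint-space path length."""
    return sum(math.dist(p, q) for p, q in zip(joint_path, joint_path[1:]))


def plan(z_start: Sequence[float], z_goal: Sequence[float],
         via_points: List[Vector],
         decoder: Callable[[Sequence[float]], Vector],
         cost: Callable[[List[Vector]], float],
         n_points: int = 20) -> List[Vector]:
    """Build one two-segment latent path per via point; keep the cheapest."""
    candidates = []
    for via in via_points:
        latent = (interpolate_latent(z_start, via, n_points)
                  + interpolate_latent(via, z_goal, n_points)[1:])
        candidates.append(decode_path(latent, decoder))
    return min(candidates, key=cost)
```

Swapping `joint_path_length` for another cost function (for example, one penalizing large motion of a particular joint) changes which candidate is selected without touching the decoder, which is the separation between latent space and planner that the abstract describes.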


Target-mass Grasping of Entangled Food using Pre-grasping & Post-grasping

Kuniyuki Takahashi, Naoki Fukaya, Avinash Ummadisingu

Category: Robotics

2022-01-04

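The food packing industry typically uses seasonal ingredients that factory workers pack by hand. Small pieces of food picked by volume or weight tend to entangle, stick, or clump together, and it is difficult to predict from visual inspection how intertwined they are, which makes it a challenge to grasp the required target mass accurately. Workers rely on a combination of weighing scales and a sequence of complex maneuvers to separate the food and reach the target mass. This makes automating the process non-trivial. In this study, we propose methods that combine 1) pre-grasping to reduce the degree of entanglement, 2) post-grasping to adjust the grasped mass by carefully discarding excess food when the grasped amount is larger than the target mass, and 3) selecting the grasping point so as to grasp an amount that is likely to be reasonably higher than the target mass. We evaluate the methods on a variety of foods that entangle, stick, and clump, each with different size, shape, and material properties such as volumetric mass density. We show a significant improvement in the accuracy of grasping user-specified target masses using the proposed methods.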


Point Cloud-based Proactive Link Quality Prediction for Millimeter-wave Communications

Shoki Ohta, Takayuki Nishio, Riichi Kudo, Kahoko Takahashi, Hisashi Nagata

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2023-01-02

This study demonstrates the feasibility of point cloud-based proactive link quality prediction for millimeter-wave (mmWave) communications. Image-based methods have been proposed that use machine learning on time series of depth images to quantitatively and deterministically predict future received signal strength, in order to mitigate blockage of the line-of-sight (LOS) path by human bodies in mmWave communications. However, image-based methods have been limited in the environments where they can be applied because camera images may contain private information. Thus, this study demonstrates the feasibility of using point clouds obtained from light detection and ranging (LiDAR) for mmWave link quality prediction. Point clouds represent three-dimensional (3D) spaces as a set of points and are sparser and less likely to contain sensitive information than camera images. Additionally, point clouds provide 3D position and motion information, which is necessary for understanding the radio propagation environment involving pedestrians. This study designs a mmWave link quality prediction method and conducts two experimental evaluations using different types of point clouds, obtained from LiDAR and from depth cameras, as well as different numerical indicators of link quality, namely received signal strength and throughput. Based on these experiments, the proposed method can predict future large attenuation of mmWave link quality due to LOS blockage by human bodies; therefore, the point cloud-based method can be an alternative to image-based methods.


Out-of-Distribution Detection with Reconstruction Error and Typicality-based Penalty

Genki Osada, Takahashi Tsubasa, Budrul Ahsan, Takashi Nishide

Categories: Machine Learning | Computer Vision

2022-12-24

The task of out-of-distribution (OOD) detection is vital to realize safe and reliable operation for real-world applications. After the failure of likelihood-based detection in high dimensions had been shown, approaches based on the typical set have been attracting attention; however, they still have not achieved satisfactory performance. Beginning by presenting the failure case of the typicality-based approach, we propose a new reconstruction error-based approach that employs normalizing flow (NF). We further introduce a typicality-based penalty, and by incorporating it into the reconstruction error in NF, we propose a new OOD detection method, penalized reconstruction error (PRE). Because the PRE detects test inputs that lie off the in-distribution manifold, it effectively detects adversarial examples as well as OOD examples. We show the effectiveness of our method through an evaluation using natural image datasets: CIFAR-10, TinyImageNet, and ILSVRC2012.


CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick

Category: Computer Vision

2022-12-14

Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio mixture given a text query. Such text-queried sound separation systems provide a natural and scalable interface for specifying arbitrary target sounds. However, supervised text-queried sound separation systems require costly labeled audio-text pairs for training. Moreover, the audio provided in existing datasets is often recorded in a controlled environment, causing a considerable generalization gap to noisy audio in the wild. In this work, we aim to approach text-queried universal sound separation by using only unlabeled data. We propose to leverage the visual modality as a bridge to learn the desired audio-textual correspondence. The proposed CLIPSep model first encodes the input query into a query vector using the contrastive language-image pretraining (CLIP) model, and the query vector is then used to condition an audio separation model to separate out the target sound. While the model is trained on image-audio pairs extracted from unlabeled videos, at test time we can instead query the model with text inputs in a zero-shot setting, thanks to the joint language-image embedding learned by the CLIP model. Further, videos in the wild often contain off-screen sounds and background noise that may hinder the model from learning the desired audio-textual correspondence. To address this problem, we further propose an approach called noise invariant training for training a query-based sound separation model on noisy data. Experimental results show that the proposed models successfully learn text-queried universal sound separation using only noisy unlabeled videos, even achieving competitive performance against a supervised model in some settings.


Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

Category: Machine Learning

2022-08-26

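In this paper, we propose a model that performs style transfer from speech to singing voice. In contrast to previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach to the problem of converting natural speech into singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving speaker identity and naturalness. The proposed SymNet model consists of a symmetrical stack of three types of layers: convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel speech and singing voice data. In these experiments, we show that the proposed SymNet model significantly improves objective reconstruction quality over previously published methods and baseline architectures. Furthermore, a subjective listening test confirms the improved quality of the audio obtained with the proposed approach (an absolute improvement of 0.37 in mean opinion score over the baseline system).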



Training Process of Unsupervised Learning Architecture for Gravity Spy Dataset

Yusuke Sakai, Yousuke Itoh, Piljong Jung, Keiko Kokeyama, Chihiro Kozakai, Katsuko T. Nakahira, Shoichi Oshino, Yutaka Shikano, Hirotaka Takahashi, Takashi Uchiyama

Category: Machine Learning (Statistics)

2022-08-07

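Transient noise appearing in the data from gravitational-wave detectors frequently causes problems, such as instability of the detectors and overlapping with or mimicking gravitational-wave signals. Because transient noise is considered to be associated with the environment and the instruments, classifying it would help in understanding its origin and improving detector performance. A previous study proposed an architecture for classifying transient noise from time-frequency 2D images (spectrograms), using unsupervised deep learning that combines a variational autoencoder with invariant information clustering. The proposed unsupervised learning architecture was applied to the Gravity Spy dataset, which consists of Advanced Laser Interferometer Gravitational-Wave Observatory (Advanced LIGO) transient noises with their associated metadata, to discuss its potential for online or offline data analysis. In this study, focusing on the Gravity Spy dataset, the training process of the unsupervised learning architecture from the previous study is examined and reported.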


UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture

Hiroyasu Akada, Jian Wang, Soshi Shimada, Masaki Takahashi, Christian Theobalt, Vladislav Golyanik

Category: Computer Vision

2022-08-02

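We present UnrealEgo, a new large-scale naturalistic dataset for egocentric 3D human pose estimation. UnrealEgo is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments. We design a virtual prototype of the glasses and attach it to 3D human models for stereo view capture. We then generate a large corpus of human motions. As a result, UnrealEgo is the first dataset to provide in-the-wild stereo images with the largest variety of motions among existing egocentric datasets. Furthermore, we propose a new benchmark method built on a simple but effective idea: a 2D keypoint estimation module designed for stereo inputs to improve 3D human pose estimation. Extensive experiments show that our approach outperforms previous state-of-the-art methods both qualitatively and quantitatively. UnrealEgo and our source code are available on our project web page.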


RealTime QA: What's the Answer Right Now?

Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, Kentaro Inui

Category: Natural Language Processing

2022-07-27

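We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). RealTime QA asks about the current world, and QA systems need to answer questions about novel events or information. It therefore challenges the static, conventional assumptions of open-domain QA datasets and pursues instantaneous applications. We build strong baseline models on large pretrained language models, including GPT-3 and T5. Our benchmark is an ongoing effort, and this preliminary report presents real-time evaluation results over the past month. Our experimental results show that GPT-3 can often properly update its generation results based on newly retrieved documents, highlighting the importance of up-to-date information retrieval. Nonetheless, we find that GPT-3 tends to return outdated answers when the retrieved documents do not provide sufficient information to find an answer. This suggests an important avenue for future research: can an open-domain QA system identify such unanswerable cases and communicate with the user, or even with the retrieval module, to modify the retrieval results? We hope that RealTime QA will spur progress in instantaneous applications of question answering and beyond.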


Switching One-Versus-the-Rest Loss to Increase the Margin of Logits for Adversarial Robustness

Sekitoshi Kanai, Shin'ya Yamaguchi, Masanori Yamada, Hiroshi Takahashi, Yasutoshi Ida

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-07-21

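Defending deep neural networks against adversarial examples is a key challenge for AI safety. To improve robustness effectively, recent methods focus on important data points near the decision boundary during adversarial training. However, these methods are vulnerable to Auto-Attack, an ensemble of parameter-free attacks used for reliable evaluation. In this paper, we experimentally investigate the causes of this vulnerability and find that existing methods reduce the margins between the logit for the true label and the logits for the other labels while keeping the gradient norms at non-small values. Reduced margins and non-small gradient norms cause the vulnerability because the largest logit can easily be flipped by a perturbation. Our experiments also show that the histogram of logit margins has two peaks, corresponding to small and large logit margins. Based on these observations, we propose the switching one-versus-the-rest loss (SOVR), which uses the one-versus-the-rest loss when data have small logit margins so that the margins increase. We find that SOVR increases logit margins more than existing methods while keeping gradient norms small, and that it outperforms them in robustness against Auto-Attack.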
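The logit margin and the switching rule described above can be sketched for a single example as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the abstract does not say which loss is used for large-margin data or how "small" is decided, so cross-entropy as the default loss and the `threshold` parameter are assumptions, and the adversarial-training loop that would produce the perturbed inputs is omitted.

```python
import math
from typing import Sequence


def logit_margin(logits: Sequence[float], label: int) -> float:
    """True-label logit minus the largest other logit (negative if misclassified)."""
    others = [v for i, v in enumerate(logits) if i != label]
    return logits[label] - max(others)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def ovr_loss(logits: Sequence[float], label: int) -> float:
    """One-versus-the-rest loss: one binary logistic loss per class.

    The true class is the positive, every other class a negative:
    -log(sigmoid(z_y)) - sum_{k != y} log(1 - sigmoid(z_k)).
    """
    loss = softplus(-logits[label])
    loss += sum(softplus(v) for i, v in enumerate(logits) if i != label)
    return loss


def switching_loss(logits: Sequence[float], label: int,
                   threshold: float = 1.0) -> float:
    """Use OVR loss for small-margin examples, cross-entropy otherwise.

    `threshold` is a hypothetical cut-off, and cross-entropy as the
    large-margin default is an assumption made for this sketch.
    """
    if logit_margin(logits, label) < threshold:
        return ovr_loss(logits, label)
    return cross_entropy(logits, label)
```

For example, `logit_margin([2.0, 0.5, -1.0], 0)` is the gap between the true-class logit 2.0 and the runner-up 0.5; with the assumed threshold of 1.0 that example would be treated as large-margin, while `[0.2, 0.5, -1.0]` with label 0 has a negative margin and would receive the one-versus-the-rest loss.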
